Project: Build a Traffic Sign Recognition Classifier

This notebook is structured in four parts:

  1. Load data, data exploration and visualization
  2. Deep learning model design, training dataset preprocessing, balancing, augmentations and model training
  3. Performance evaluation and predictions
  4. (Optional) visualization of the network state

Part 1: Load data, preprocessing, balancing and augmentations

Pipeline flags for writeup image generation, fake data generation and training

Since some processes and functions require substantial computation time, I set up flags that enable or disable specific steps. All flags are set to True before the submission.

  1. plot_writeup_images: if True, many images are plotted, giving insights into the datasets but also slowing down processing
  2. generate_fake_data_for_augmentation: if True, unique fake images are generated
  3. train_model_flag: if True, the model training is started
  4. process_own_images: if True, the pre-processing pipeline is run on my own images
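In the notebook these flags are plain booleans defined near the top (a minimal sketch; the names follow the list above):

```python
# Pipeline flags: all set to True before the submission run.
plot_writeup_images = True                  # plot dataset-insight images (slower)
generate_fake_data_for_augmentation = True  # generate unique fake images
train_model_flag = True                     # start the model training
process_own_images = True                   # run the pipeline on my own images
```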

Load the dataset & check its integrity

The pickled data is a dictionary with 4 key/value pairs:
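Loading one split might look like the sketch below; the key names ('features' and 'labels') are assumptions based on how the data is used later, not confirmed by this text:

```python
import pickle

def load_split(path):
    """Load one pickled split and return images and class ids.

    'features' and 'labels' are assumed key names; the remaining
    two keys are presumed to hold per-image metadata.
    """
    with open(path, 'rb') as f:
        data = pickle.load(f)
    return data['features'], data['labels']
```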


Dataset Summary & Exploration

Visualization of some image examples

There are three visualization functions.

  1. One, 'plot_sample_of_each', shows a subplot of 15 (by default) color images from the training dataset. I decided to show only one image per class. This method is used to produce a picture for the writeup.

  2. The method 'plot_sign_count_bar' takes as input the sign names and the number of signs per class and produces a sorted bar plot that shows the distribution of the different classes in terms of number of images. This method is useful to check how augmentation techniques shape the distribution of images. When I apply this function to the original datasets (train, valid and test), it shows that they all have a similar distribution among the classes and that none of them is balanced.

  3. The method 'plot_samples_from_all_classes' scrolls through the entire dataset and shows a few random images per class (10 images per class by default). This method is useful to observe the datasets and to check the overall impact of the image filters.
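The counting behind 'plot_sign_count_bar' can be sketched with NumPy (the sorting mirrors the sorted bar plot; names are illustrative):

```python
import numpy as np

def sorted_class_counts(labels, n_classes=43):
    """Return class ids and per-class image counts, most frequent
    first, as used to build the sorted bar plot."""
    counts = np.bincount(labels, minlength=n_classes)
    order = np.argsort(counts)[::-1]  # most frequent class first
    return order, counts[order]
```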

Distribution of datasets


Part 2: Deep learning model design, data preprocessing, balancing, augmentations and model training

I designed and implemented a deep learning model that learns to recognize traffic signs. The model is trained and tested on the German Traffic Sign Dataset. The model is also tested against additional images that I personally took in Germany.

  1. The network model
  2. Conversion to greyscale
  3. Training dataset balancing
  4. Training dataset augmentations
  5. Datasets normalisation and distribution centering
  6. Model training

Model Architecture

The current model is a modified version of the LeNet-5 implementation shown in the classroom.

The current implementation uses five hidden layers: three convolutional layers with increasing depth (I obtained the best results with depths = [32, 64, 128]) and two fully connected layers before the output layer.

To counteract overfitting, the model applies dropout regularization only to the fully connected layers, that is, layers 4 and 5 (and not to the convolutional layers). The best results (98.4% validation accuracy and 96.1% test accuracy) are obtained with p4=0.3 and p5=0.2.

The network has a total of 244,851 trainable parameters.

Input

The network architecture accepts a 32x32x1 image as input. The images have a single channel (grayscale images are assumed).

Architecture

Layer 1: Convolutional, with depth 32, strides 1x1 and 'valid' padding. The output shape is 30x30x32.

Activation. ReLU

Pooling. Max pooling with kernel size 2x2 and padding 'same'; the output shape is 15x15x32.

Layer 2: Convolutional, with depth 64, strides 1x1 and 'valid' padding. The output shape is 13x13x64.

Activation. ReLU

Pooling. Max pooling with kernel size 2x2 and padding 'same'; the output shape is 7x7x64.

Layer 3: Convolutional, with depth 128, strides 1x1 and 'valid' padding. The output shape is 5x5x128.

Activation. ReLU

Pooling. Max pooling with kernel size 2x2 and padding 'same'; the output shape is 3x3x128.

Flatten. Flattening the output of the final pooling layer yields 1152 elements.

Layer 4: Fully Connected, with 120 outputs.

Activation. ReLU

Regularization. Dropout with keep probability of 0.3.

Layer 5: Fully Connected, with 84 outputs.

Activation. ReLU

Regularization. Dropout with keep probability of 0.2.

Layer 5-out: Fully Connected (Logits). 43 outputs

Activation. Softmax

Output

Return the logits of the final fully connected layer (43 values, one per class).
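The stated total of 244,851 trainable parameters can be checked from the shapes above (a quick arithmetic sketch, assuming 3x3 kernels, which is what the 'valid' 32-to-30 shape reduction implies):

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

total = (conv_params(3, 1, 32)       # layer 1: 320
         + conv_params(3, 32, 64)    # layer 2: 18,496
         + conv_params(3, 64, 128)   # layer 3: 73,856
         + fc_params(1152, 120)      # layer 4: 138,360
         + fc_params(120, 84)        # layer 5: 10,164
         + fc_params(84, 43))        # logits:  3,655
print(total)  # 244851
```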

Pre-process the Data Set

As suggested in the paper by Sermanet et al., I applied several image pre-processing techniques to improve the model performance. Some of the filters and techniques are also used to generate fake data that helps counteract model overfitting.

  1. Normalisation and centering
  2. Conversion to greyscale
  3. Image cropping and padding
  4. Image scaling
  5. Image translation
  6. Image rotation
  7. Image shearing
  8. Histogram equalization (CLAHE)
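The greyscale conversion can be sketched with the standard luminance weights (a minimal NumPy version; the notebook itself may use an OpenCV call instead):

```python
import numpy as np

def to_grayscale(images):
    """Convert a batch of RGB images (N, H, W, 3) into single-channel
    images (N, H, W, 1) using the usual luminance weights."""
    w = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = np.tensordot(images.astype(np.float32), w, axes=([-1], [0]))
    return gray[..., np.newaxis]
```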

Normalisation and centering
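A sketch of the normalisation and centering step, assuming the training-set statistics are reused for the other splits (function and parameter names are illustrative):

```python
import numpy as np

def normalize_and_center(X, mean=None, std=None):
    """Scale pixels to floats, center the distribution at zero and
    give it unit spread. mean/std are computed on the training set
    and reused for the validation and test splits."""
    X = X.astype(np.float32)
    if mean is None:
        mean = X.mean()
    if std is None:
        std = X.std()
    return (X - mean) / std, mean, std
```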

Writeup image functions to show image augmentation filters

Apply equalization and gray filters

Apply histogram equalization to all datasets

Dataset balancing and augmentation using the imgaug library

Generate fake data to balance all classes and increase the training dataset size
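The balancing step can be sketched as oversampling every class up to the size of the largest class (in the actual pipeline the duplicated images would then be passed through the imgaug augmenters so the copies are not identical; names below are illustrative):

```python
import numpy as np

def balance_by_oversampling(X, y, rng=None):
    """Duplicate images of minority classes until every class has as
    many samples as the largest class."""
    rng = np.random.default_rng() if rng is None else rng
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c in classes:
        members = np.flatnonzero(y == c)
        # draw extra indices with replacement from this class
        extra = rng.choice(members, target - members.size, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]
```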

Apply normalisation and centering to gray images

Loading pickle files with pre-processed datasets

Select the dataset to use for training and model parameters

Define log folders and callback functions for model training

Train the model, check validation accuracy and test the results against the test dataset

Use TensorBoard to monitor the training


Part 3: Test a Model on New Images

To give yourself more insight into how your model is working, download at least five pictures of German traffic signs from the web and use your model to predict the traffic sign type.

You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.

Load and Output the Images

Load own images from pickle file

Predict the Sign Type for Each Image

Analyze Performance

Output Top 5 Softmax Probabilities For Each Additional Image I took
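The top-5 extraction can be sketched directly in NumPy: a numerically stable softmax over the logits followed by an argsort (names are illustrative):

```python
import numpy as np

def top_k_softmax(logits, k=5):
    """Return the k most probable class ids and their softmax
    probabilities for each row of logits."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    top = np.argsort(p, axis=1)[:, ::-1][:, :k]  # best class first
    return top, np.take_along_axis(p, top, axis=1)
```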

Project Writeup

Once you have completed the code implementation, document your results in a project writeup using this template as a guide. The writeup can be in a markdown or pdf file.

Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.


Part 4 (Optional): Visualize the Neural Network's State with Test Images

This section is not required to complete but acts as an additional exercise for understanding the output of a neural network's weights. While neural networks can be a great learning device, they are often referred to as a black box. We can better understand what the weights of a neural network look like by plotting their feature maps. After successfully training your neural network, you can see what its feature maps look like by plotting the output of the network's weight layers in response to a test stimulus image. From these plotted feature maps, it's possible to see what characteristics of an image the network finds interesting. For a sign, maybe the inner network feature maps react with high activation to the sign's boundary outline or to the contrast in the sign's painted symbol.

Provided for you below is the function code that allows you to get the visualization output of any TensorFlow weight layer you want. The inputs to the function should be a stimulus image (one used during training or a new one you provide) and the TensorFlow variable name that represents the layer's state during the training process. For instance, if you wanted to see what the LeNet lab's feature maps looked like for its second convolutional layer, you could enter conv2 as the tf_activation variable.

For an example of what feature map outputs look like, check out NVIDIA's results in their paper End-to-End Deep Learning for Self-Driving Cars in the section Visualization of internal CNN State. NVIDIA was able to show that their network's inner weights had high activations to road boundary lines by comparing feature maps from an image with a clear path to one without. Try experimenting with a similar test to show that your trained network's weights are looking for interesting features, whether it's looking at differences in feature maps from images with or without a sign, or even what feature maps look like in a trained network vs a completely untrained one on the same sign image.

Combined Image

Your output should look something like this (above)